BL-Database: A French audiovisual database for speech driven lip animation systems

Authors

  • Yannick Benezeth
  • Frédéric Bimbot
  • Grégoire Bachman
  • Guylaine Le Jan
  • Nathan Souviraà-Labastie
Abstract

The lack of publicly available annotated databases is a major limitation to research advances in speech processing. We describe in this paper an audiovisual speech database which is being made available to the research community. Our database, called BL-database (Blue Lips-database), consists of 238 utterances spoken by 17 speakers. The recordings were made during two sessions. The data of the first session can be used to analyze the 2D movements of the mouth, while the data collected during the second session is dedicated to 3D analysis. The audio signal has been phonetically segmented and labeled. Such data is expected to be of great interest to all research groups working on multimodal automatic speech recognition, audio/visual synchronization or speech-driven lip animation.

Key-words: 3D audiovisual speech database, speech-driven lip animation.

inria-00614761, version 1 - 1 Sep 2011

BL-Database : Une base de données audiovisuelle en Français pour l'animation labiale à partir de flux de parole

Résumé : Le manque de bases de données annotées et librement accessibles est une limitation importante aux avancées de la recherche sur le traitement de la parole. Nous décrivons dans cet article une base de données audiovisuelle mise à la disposition de la communauté scientifique. Notre base de données audiovisuelle, appelée BL-database, est composée de 238 phrases prononcées par 17 locuteurs. Les enregistrements se sont déroulés lors de deux sessions. Les données de la première session peuvent être utilisées pour l'étude des mouvements 2D de la bouche d'un locuteur alors que les données de la deuxième session permettent une analyse des mouvements 3D de la bouche. Nous fournissons également, en plus des données audiovisuelles, la transcription phonétique de la base. Ces données peuvent être utilisées pour des travaux de recherche portant sur la reconnaissance de la parole multimodale, la synchronisation audiovisuelle ou sur l'animation labiale à partir de la parole.

Mots-clés : base de données audiovisuelle 3D, animation labiale.

Similar resources

Inter-speaker Synchronization in Audiovisual Database for Lip-Readable Speech to Animation Conversion

The present study proposes an inter-speaker audiovisual synchronization method to decrease the speaker dependency of our direct speech to animation conversion system. Our aim is to convert an everyday speaker’s voice to lip-readable facial animation for hearing impaired users. This conversion needs mixed training data: acoustic features from normal speakers coupled with visual features from pro...

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...

Image-based Talking Head: Analysis and Synthesis

In this paper, our image-based talking head system is presented, which includes two parts: analysis and synthesis. In the analysis part, a subject reading a predefined corpus is recorded first. The recorded audio-visual data is analyzed in order to create a database containing a large number of normalized mouth images and their related information. The synthesis part generates natural looking t...

Carnival-combining speech technology and computer animation.

Speech is powerful information technology and the basis of human interaction. By emitting streams of buzzing, popping, and hissing noises from our mouths, we transmit thoughts, intentions, and knowledge of the world from one mind to another. We’re accustomed to thinking of speech as an acoustic, auditory phenomenon. However, speech is also visible. Although the primary function of speech is to ...

Developing a Standardized Medical Speech Recognition Database for Reconstructive Hand Surgery

Fast and holistic access to the patients’ clinical record is a major requirement of modern medical decision support systems (DSS). While electronic health records (EHRs) have replaced the traditional paper-based records in most healthcare organization, the data entry into these systems remains largely manual. Speech recognition technology promises substitution of the more convenient speech-base...

Publication date: 2011